New York City: Political Finance Analysis

Miles Bartnik

13612130

37373 Programming for Data Analysis

Autumn 2020


Abstract

The main goal of this project is to explore the influences and outcomes of financing in New York politics over the most recent election cycle. This will be achieved by exploring the relationship between political donations and contracts awarded on behalf of the NYC government.

The exploratory data analysis will focus on the geographic distribution of donations and contracts awarded to determine the regions with the most interest in the NYC government and how that interest is rewarded. This project will also determine the characteristics of those who are donating directly to NYC politicians.


1. Motivation

I have always had a passion for politics and for how the system operates outside the public view. You can often gain much greater insight into the motivations and actions of a politician by understanding their support base, their fundraiser attendees, and their factional powerbrokers. In particular, I have always wanted to determine what effect, if any, the American culture of political donations, fundraising events, and Political Action Committees (PACs) has on American politics.

There is a perception that being a political insider brings benefits later on, because the government of the day remembers who put it in power, whether through favourable policies or by directly awarding its supporters lucrative contracts. I want to determine whether this belief has any basis in reality by exploring the relationship between campaign contributions and contracts awarded on behalf of New York City, to see how strong that relationship is, or whether it exists at all.

I also want to gain a better understanding of the NYC political landscape, who the important members are on both the political front and the fundraising scene, and look at what implications this may have for the upcoming 2021 NYC elections.


2. The Data

For this analysis, we will be using two datasets from the NYC Open Data portal.

Dataset | Rows | Columns | Description
2021 Campaign Contributions | 84,800 | 52 | Contributions to 2021 NYC election campaigns
Recent Contract Awards | 35,600 | 37 | The Office of Citywide Purchasing (OCP) solicits and awards contracts for a wide variety of goods and services on behalf of all City agencies


3. Data Preparation

Non-Standard Dependencies

This notebook uses some non-standard libraries that need to be installed with the following terminal commands:
geopy (address processing): pip install geopy
ipyleaflet (mapping): pip install ipyleaflet
Folium (mapping): pip install folium
branca (colour mapping): pip install branca


Importing The Data

The data dictionaries for NYC Open Data datasets are .xlsx files with multiple sheets. To obtain the data dictionary on its own, we must extract the relevant sheet from the .xlsx file.
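A minimal sketch of this step with pandas, assuming the column descriptions sit on a sheet whose name contains "Column". That `keyword` default is my assumption, so check the sheet names in the downloaded workbook; `pick_sheet` and `load_data_dictionary` are hypothetical helper names.

```python
import pandas as pd


def pick_sheet(sheets: dict, keyword: str = "column") -> pd.DataFrame:
    """Return the first sheet whose name contains `keyword` (case-insensitive).

    keyword="column" is an assumed default; adjust it to the workbook's sheet names.
    """
    for name, frame in sheets.items():
        if keyword.lower() in name.lower():
            return frame
    raise KeyError(f"no sheet name contains {keyword!r}; found {list(sheets)}")


def load_data_dictionary(path: str, keyword: str = "column") -> pd.DataFrame:
    # sheet_name=None reads every sheet into a dict of DataFrames keyed by sheet name;
    # reading .xlsx files also needs an Excel engine such as openpyxl installed
    sheets = pd.read_excel(path, sheet_name=None)
    return pick_sheet(sheets, keyword)
```

Selecting the sheet by name rather than by position keeps the lookup working if the sheet order in the workbook changes.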


Formatting the Campaign Contributions Data

We will start by formatting the Campaign Contributions data (donation_data in the code). From the data dictionary, we can determine which columns are not required for this analysis.

We are focusing on the assignment of contracts to businesses, so we will filter these donations using the C_CODE column, which declares what kind of entity made the donation. We will keep both sets of address data (the donor's own address and their employer's address) for our analysis. We will also keep the donation purpose column, as it may yield interesting results about how the money was intended to be spent.
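A sketch of the column selection and a first look at C_CODE. Only C_CODE, OFFICECD and BOROUGHCD are named in this notebook; every other name in `KEEP` is my assumption about the Campaign Finance Board export and should be checked against the data dictionary. Counting blanks explicitly (`dropna=False`) shows how much of C_CODE is actually filled in before we rely on it as a filter.

```python
import pandas as pd

# Hypothetical column list: C_CODE, OFFICECD and BOROUGHCD appear in the notebook;
# the remaining names are assumptions about the Campaign Finance Board export.
KEEP = [
    "NAME", "RECIPNAME", "OFFICECD", "C_CODE", "DATE", "AMNT", "OCCUPATION",
    "STRNO", "STRNAME", "CITY", "STATE", "ZIP", "BOROUGHCD",           # donor address
    "EMPNAME", "EMPSTRNO", "EMPSTRNAME", "EMPCITY", "EMPSTATE",        # employer address
    "PURPOSECD",                                                       # assumed purpose column
]


def trim_columns(df: pd.DataFrame, keep=KEEP) -> pd.DataFrame:
    # Keep only the listed columns that actually exist, so a missing name does not raise
    present = [col for col in keep if col in df.columns]
    return df.loc[:, present].copy()


def contributor_types(df: pd.DataFrame) -> pd.Series:
    # dropna=False counts blank entries as well as the declared codes
    return df["C_CODE"].value_counts(dropna=False)
```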

According to the data dictionary, the OFFICECD column (the office sought) should contain codes 1-6. However, some entries in this column have double digits. This is clearly a clerical error, and one we can rectify using a regular expression.
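One way to do this with a regular expression, assuming the first digit of a double-digit entry is the intended office code. If the double-digit entries follow a different pattern, the regex needs to change. `clean_office_code` is a hypothetical helper name.

```python
import pandas as pd


def clean_office_code(codes: pd.Series) -> pd.Series:
    """Keep only the leading digit of each OFFICECD entry (assumed to be the real 1-6 code)."""
    # astype(str) lets the regex run whether the column was read as numbers or text
    first_digit = codes.astype(str).str.extract(r"^\s*([1-6])", expand=False)
    # Entries that do not start with 1-6 become missing rather than being guessed
    return pd.to_numeric(first_digit, errors="coerce").astype("Int64")
```

Turning unmatched entries into missing values keeps them visible in later counts instead of silently assigning them to an office.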

We also need to normalise the Occupation data so that every entry uses the same format. This prevents the same occupation from being counted twice because of differences in case.
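A minimal sketch of the normalisation, assuming the column holds free-text occupations. Besides upper-casing, it also trims and collapses whitespace, which is my addition since stray spaces split categories in the same way case does. The helper name is hypothetical.

```python
import pandas as pd


def normalise_occupation(occupations: pd.Series) -> pd.Series:
    return (
        occupations.astype("string")              # missing values become <NA>
        .str.strip()                              # drop leading/trailing spaces
        .str.replace(r"\s+", " ", regex=True)     # collapse runs of internal whitespace
        .str.upper()                              # "Ceo" and "CEO" now match
    )
```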


Campaign Contributions Data for Geographic Analysis

For the geographic analysis, we need to convert all street addresses into a format that our geocoder, Nominatim, can handle.
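A sketch of building one free-form query string per row, which is the kind of input Nominatim accepts. The component column names (STRNO, STRNAME, CITY, STATE, ZIP) are my assumptions; the same function can be pointed at the employer address columns. Reading the file with `dtype=str` avoids ZIP codes turning into floats such as "10001.0".

```python
import pandas as pd


def _clean(value) -> str:
    # Treat NaN/None as empty and drop stray whitespace
    return "" if pd.isna(value) else str(value).strip()


def build_address(row, number="STRNO", street="STRNAME", city="CITY",
                  state="STATE", zipcode="ZIP") -> str:
    # Assumed column names; row.get returns None if a column is absent
    street_line = " ".join(p for p in (_clean(row.get(number)), _clean(row.get(street))) if p)
    region = " ".join(p for p in (_clean(row.get(state)), _clean(row.get(zipcode))) if p)
    # Skip empty components so the query never contains ", ," gaps
    return ", ".join(p for p in (street_line, _clean(row.get(city)), region) if p)
```

Usage: `donation_data["Full_Address"] = donation_data.apply(build_address, axis=1)`, where `Full_Address` is a hypothetical column name.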


Extracting Geodata

Now that we have a complete address string, we need a function that obtains coordinates for the geographic analysis. This will use geopy's Nominatim geocoder to convert our address data into usable geodata. If Nominatim cannot determine a location from an address, we will store it in a broken_addresses list and attempt to repair it using regular expressions.
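A minimal sketch of the geocoding loop. `make_nominatim_geocoder` wraps geopy's `Nominatim.geocode` in geopy's `RateLimiter` to respect Nominatim's one-request-per-second limit; `geocode_addresses` takes the geocoding function as a parameter so it can be exercised offline with a stand-in. Both helper names and the `user_agent` string are placeholders of my own, not the notebook's code.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def make_nominatim_geocoder(user_agent: str) -> Callable:
    # Imported here so the rest of this sketch runs without geopy installed
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)  # user_agent: placeholder app name
    # RateLimiter waits at least 1 s between calls and, by default, returns None
    # instead of raising once its retries are exhausted
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def geocode_addresses(
    addresses: Iterable[str], geocode: Callable
) -> Tuple[Dict[str, Tuple[float, float, str]], List[str]]:
    found: Dict[str, Tuple[float, float, str]] = {}
    broken_addresses: List[str] = []
    # dict.fromkeys de-duplicates while preserving order, so each address is queried once
    for address in dict.fromkeys(addresses):
        location = geocode(address)
        if location is None:
            broken_addresses.append(address)  # candidates for regex repair
        else:
            found[address] = (location.latitude, location.longitude, location.address)
    return found, broken_addresses
```

Usage: `found, broken = geocode_addresses(addresses, make_nominatim_geocoder("nyc-finance-analysis"))`. Keying the results on the original address string makes it straightforward to map them back onto the full DataFrame later.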

Example

Capturing The Data

Repairing Broken Address
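A sketch of regex-based repair. The rules below are hypothetical examples of patterns that commonly confuse a geocoder (unit designators, bare "#" unit numbers, empty comma-separated components); they are not the notebook's actual rules. "FL" and "ST" are deliberately left out because they collide with Florida and with street or saint abbreviations.

```python
import re

# Hypothetical repair rules, applied in order
REPAIRS = [
    # Unit designators such as "Apt 4B", "Suite #200", "Floor 12"
    (re.compile(r"\b(?:APT|APARTMENT|SUITE|STE|FLOOR|UNIT|RM|ROOM)\b\.?\s*#?\s*\w+", re.I), ""),
    # Bare unit numbers such as "#12B"
    (re.compile(r"#\s*\w+"), ""),
    # Normalise any run of commas (including empty components) to a single ", "
    (re.compile(r"\s*(?:,\s*)+"), ", "),
    # Collapse repeated spaces
    (re.compile(r"\s{2,}"), " "),
]


def repair_address(address: str) -> str:
    for pattern, replacement in REPAIRS:
        address = pattern.sub(replacement, address)
    return address.strip(" ,")
```

The trailing `\b` after each designator stops "STE" from matching the start of a street name such as "STEINWAY".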

Testing New Address

Importing Preprocessed Address Data

The number of unique addresses is far too large for this notebook to process in a reasonable timeframe, so the geocoding was done in advance and the results are imported here.


Mapping to Old Addresses

We now need to add this information to our existing DataFrames. We need a mapping function that maps each unique address in the geodata DataFrames to the corresponding addresses in the master DataFrames, which may contain many instances of each address. To ensure assignment occurs correctly, we map on the original address. We can then create the new Geo_Address column, which will be consistent for every entry and provide more information about the original address.
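A sketch of the mapping step, assuming the geodata DataFrame has one row per original address in a column called `Full_Address` (a hypothetical name), plus `Geo_Address`, `Latitude` and `Longitude` columns (the latter two also assumed). `Series.map` with a lookup keyed on the original address fills every repeated instance without changing the number of rows, which a `merge` could do if the geodata contained duplicate keys.

```python
import pandas as pd


def attach_geodata(master: pd.DataFrame, geodata: pd.DataFrame, key: str = "Full_Address",
                   columns=("Geo_Address", "Latitude", "Longitude")) -> pd.DataFrame:
    # One lookup row per original address; drop duplicates defensively
    lookup = geodata.drop_duplicates(subset=key).set_index(key)
    out = master.copy()
    for col in columns:
        # Every row with the same original address receives the same geodata;
        # addresses that were never geocoded get NaN
        out[col] = out[key].map(lookup[col])
    return out
```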


Repeat for Contract Data


4. Exploratory Data Analysis

Now that we have our data prepared, we can start exploring it to gain a better understanding of NYC political finances.

Donation Proportions by Office

First, let's see the proportion of donations made on the basis of what office the candidates were running for.

We can see that City Council donations account for almost half of all donations made, though the Undeclared category is also large. Either New York has more categories of government office than the six listed in the data dictionary, or many of these donations were filed incompletely.
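A minimal sketch of the proportion calculation, assuming a hypothetical `OFFICE` column that holds office names decoded from OFFICECD. `value_counts(normalize=True)` turns counts into fractions, and `dropna=False` keeps blank entries visible rather than silently excluding them from the denominator.

```python
import pandas as pd


def donation_share_by_office(df: pd.DataFrame, office_col: str = "OFFICE") -> pd.Series:
    # Fraction of all donations per office, blanks included
    shares = df[office_col].value_counts(normalize=True, dropna=False)
    return (shares * 100).round(1)  # expressed as a percentage
```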

Total Donations by Office

We will now look at the total value of donations received, broken down by the office the candidate was running for.

It seems that the most significant office to influence in New York is the City Council, which receives the majority of funds raised. New York City has a mayor-council system of government, in which the City Council makes land use decisions, monitors the performance of its subordinate government agencies, and approves the city budget, a power it holds unilaterally. It would be the ideal office to donate to if your aim was to influence the policies and budget of New York.


Average Donations

Average donation would also be an indicator for the kind of financing these offices have access to.

Despite receiving relatively few donations, Public Advocate candidates have a much higher average donation. This is likely because the Public Advocate is a lesser-known office, so a larger share of its contributions is likely to be self-funded.
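The totals and averages for each office can be produced in one grouped aggregation. This sketch assumes hypothetical `OFFICE` and `AMNT` (amount) columns and uses pandas named aggregation. I have added the median, which is less sensitive than the mean to a handful of large contributions, so it offers a quick check on the self-funding explanation above.

```python
import pandas as pd


def office_summary(df: pd.DataFrame, office_col: str = "OFFICE",
                   amount_col: str = "AMNT") -> pd.DataFrame:
    summary = df.groupby(office_col)[amount_col].agg(
        donations="count",   # number of donations
        total="sum",         # total raised
        average="mean",      # mean donation
        median="median",     # typical donation, robust to a few very large ones
    )
    return summary.sort_values("total", ascending=False)
```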


Occupation of Donors

What kind of person is most likely to donate to New York politicians?

We can see that higher net worth individuals are more likely to contribute to the political process. This is intuitive, as these people are more likely to have enough disposable income to make these contributions. They also have more incentive to contribute than most, as business leaders are more likely to want to influence legislation in their favour. Homemaker makes an interesting appearance as the occupation with the 3rd highest amount donated, though I suspect this reflects members of the same household making separate donations to contribute more money.


Amounts Donated by Top 10 Occupations

Let's examine the donation habits of the top 10 occupations more closely. First, to account for people making multiple small donations, we will sum all donations made by each donor name and then group these totals by occupation.
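A sketch of this two-stage aggregation, assuming hypothetical `NAME`, `OCCUPATION` and `AMNT` columns. Grouping by donor and occupation together means a donor who listed different occupations on different filings is split across them, which is a limitation of this approach.

```python
import pandas as pd


def donor_totals_by_occupation(df: pd.DataFrame, top_n: int = 10, donor_col: str = "NAME",
                               occ_col: str = "OCCUPATION", amount_col: str = "AMNT") -> pd.DataFrame:
    # Stage 1: collapse repeated small donations into one total per donor
    per_donor = df.groupby([donor_col, occ_col], as_index=False)[amount_col].sum()
    # Stage 2: rank occupations by their combined totals and keep the top_n
    top_occupations = per_donor.groupby(occ_col)[amount_col].sum().nlargest(top_n).index
    # One row per donor in those occupations, ready for a boxplot by occupation
    return per_donor[per_donor[occ_col].isin(top_occupations)].reset_index(drop=True)
```

The result can then be passed to `DataFrame.boxplot(column="AMNT", by="OCCUPATION", rot=90)`.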

Now we can plot the data on a boxplot.

The entries that stand out to me from this boxplot are CEOs, Executives, Presidents, and those working in Real Estate. These entries have the longest whiskers, indicating a wider spread of donor totals that extends further towards the higher end of the range. They also record the highest totals by individual donors, with their largest nearing or surpassing $10,000.
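To make the whisker reading concrete, here is a small sketch of how a boxplot using the default Tukey rule (matplotlib's `whis=1.5`) places its whiskers: each whisker ends at the most extreme data point within 1.5 times the interquartile range beyond the box, and anything further out is drawn as an outlier. A long whisker therefore signals a wide spread of typical totals. `whisker_extent` is a hypothetical helper for illustration.

```python
import numpy as np


def whisker_extent(values, whis: float = 1.5):
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])          # box edges
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    lower = data[data >= low_fence].min()           # lowest point inside the fence
    upper = data[data <= high_fence].max()          # highest point inside the fence
    outliers = data[(data < low_fence) | (data > high_fence)]
    return float(lower), float(upper), outliers
```

For donor totals of 1, 2, 3, 4, 5 and 100, the upper whisker stops at 5 and the single large total is shown as an outlier point.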


Top Donors

Let's examine the characteristics of the highest contributors and determine their likely occupation.

Again, the same occupations are represented among the top 1000 donors, in almost the same order as on the total contributions chart.


Top 10 Donors

Let's examine the people at the highest levels of contribution.

Below is a breakdown of who these individuals are.

Contributor | Background
Daniel Brodsky | American real estate developer, art collector, and chairman of the Metropolitan Museum of Art
Andrew Albstein | Managing partner at Goldberg Weprin Finkel Goldstein LLP, a New York law firm focused on real estate
Kenneth Fisher | Billionaire founder of Fisher Investments, a financial advisory firm
John S Klein | I could not find any concrete information on John S Klein
Murat Guzel | CEO of Nimeks organic/Natural Food Source Inc
Diana Boutross | Executive Managing Director of Cushman and Wakefield, a real estate firm
Arthur Zeckendorf | Third-generation co-chairman of Zeckendorf Development, a family real estate empire
George Tsunis | Chairman and Chief Executive Officer of Chartwell Hotels
Tumay Basaranlar | CEO of Atlantis Management Group, a petroleum distribution company
Jose Montero | President of Atlantis Management Group, a petroleum distribution company


Donations for 2021 Election Cycle over time

We will now examine how the funds for this election cycle were raised over time, and try to determine the peak fundraising periods. I am assuming that all donations declared in this dataset were intended to be spent on this election.

Donations for the 2021 election cycle made before 2018 are almost nonexistent. From 2018 onwards, fundraising peaks every 6 months, in January and again in July. As the election approaches, fundraising increases almost exponentially. I expect that candidates with fundraising apparatuses already in place were able to raise donations consistently, while less established candidates waited until the election period was upon them to raise donations.


Amount of Donations over time

We will now examine the value of these donations raised over time.

It seems that donations made outside the official fundraising periods are larger than would be expected. These donations may be associated with officials making contributions to their own campaigns, or arranging donations in some other way with higher net worth individuals.
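Both time series (the number of donations and their value) can come from one monthly resample. This sketch assumes hypothetical `DATE` and `AMNT` columns. Month-start bins (`"MS"`) keep the January and July peaks in separate bins, and months without any donations appear as zeros rather than disappearing from the plot.

```python
import pandas as pd


def monthly_donations(df: pd.DataFrame, date_col: str = "DATE",
                      amount_col: str = "AMNT") -> pd.DataFrame:
    # Unparseable dates become NaT and are dropped before resampling
    dated = df.assign(**{date_col: pd.to_datetime(df[date_col], errors="coerce")})
    dated = dated.dropna(subset=[date_col])
    # "count" = number of donations, "sum" = dollars raised, per calendar month
    return dated.resample("MS", on=date_col)[amount_col].agg(["count", "sum"])
```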


Donations over time by Office

Let's break the donation time series down by office.

As discussed before, if you want to influence New York policy you need to influence the City Council, and this figure shows it clearly. The City Council dominates in funds raised closer to the election. What is interesting is that the Mayoral donations appear to trail off as the election approaches. It is possible that these donations will pick up closer to the NYC mayoral election in November 2021.
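A sketch of the month-by-office breakdown using `pivot_table`, assuming hypothetical `DATE`, `OFFICE` and `AMNT` columns. Each column of the result is one office, so calling `.plot()` on the table draws one line per office.

```python
import pandas as pd


def monthly_by_office(df: pd.DataFrame, date_col: str = "DATE", office_col: str = "OFFICE",
                      amount_col: str = "AMNT") -> pd.DataFrame:
    # Bucket each donation into its calendar month
    dated = df.assign(month=pd.to_datetime(df[date_col]).dt.to_period("M"))
    # Rows = months, columns = offices, values = dollars raised;
    # fill_value=0 marks offices with no donations in a month that had some
    return dated.pivot_table(index="month", columns=office_col, values=amount_col,
                             aggfunc="sum", fill_value=0)
```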


Top 10 City Council Members by Donations

Let's examine the City Council members with the most donations.

The highest number of donations went to Mark Gjonaj, a Democratic City Council member for District 13 in the Bronx, who previously represented District 80 of the New York State Assembly, encompassing Morris Park, Pelham Parkway, Pelham Gardens, Norwood, and other communities in the Borough of the Bronx.
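A sketch of the ranking used for this and the mayoral table, assuming hypothetical `OFFICE`, `RECIPNAME` (recipient) and `AMNT` columns. The `by` parameter switches between ranking on the number of donations and ranking on the total raised, since the two rankings can differ.

```python
import pandas as pd


def top_recipients(df: pd.DataFrame, office: str, n: int = 10, by: str = "donations",
                   office_col: str = "OFFICE", recip_col: str = "RECIPNAME",
                   amount_col: str = "AMNT") -> pd.DataFrame:
    subset = df[df[office_col] == office]
    summary = subset.groupby(recip_col)[amount_col].agg(donations="count", total="sum")
    # by="donations" ranks on number of donations; by="total" ranks on dollars raised
    return summary.nlargest(n, by)
```

For example, `top_recipients(donation_data, "Mayor", by="total")` would rank the mayoral candidates, with "Mayor" being an assumed office label.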

Top 10 Mayoral Candidates by Donations

Let's do the same for Mayoral candidates.

The Mayoral donations are dominated by Ruben Diaz Jr., who is currently serving as Borough President of the Bronx. It is interesting to note that the highest donation amounts raised for both offices went to candidates responsible for the Bronx region of New York.


Mayoral Donations over time by Candidate

Let's examine any trends in the timing of donations, to see if there is a challenger emerging against Ruben Diaz Jr.

Among the top 2 candidates by donations, it would seem that Ruben Diaz is on a downward trend while Dianne Morales has no donation history before July 2019. Further research has determined that Ruben Diaz dropped out of the Mayoral race in January 2021, saying he wanted to spend more time with his family.

NYC Government Spending

Let's now examine the Recent Contracts Data to see if there is anything interesting in New York City expenditure.

When I plotted this for the first time, I jumped out of my chair. NYC has a contract totalling more than $11 billion US with a single company. I'm still not exactly sure of the circumstances behind this contract, as it appears to be a legacy contract paid on a continuous basis. I believe COVANSYS is responsible for all of the NYC government's software and technology infrastructure, as that is the only explanation I can find for the size of the contract.

That is an absolutely staggering amount of expenditure.
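To quantify how dominant a single vendor is, we can compute each vendor's share of total contract value. This sketch assumes hypothetical `VendorName` and `ContractAmount` columns in the Recent Contract Awards data; check the data dictionary for the real names.

```python
import pandas as pd


def vendor_share(contracts: pd.DataFrame, vendor_col: str = "VendorName",
                 amount_col: str = "ContractAmount", n: int = 10) -> pd.DataFrame:
    # Total contract value per vendor, largest first
    totals = contracts.groupby(vendor_col)[amount_col].sum().sort_values(ascending=False)
    share = (totals / totals.sum() * 100).round(1)  # percentage of all contract value
    return pd.DataFrame({"total": totals, "share_pct": share}).head(n)
```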


Folium Heat Maps

To produce meaningful heat maps, we need a legend on the map, which Folium does not provide by default. To build one, we will extract the minimum and maximum values from the donation amounts and the contract amounts.
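A sketch of the legend step, assuming branca's `LinearColormap` serves as the legend. `value_range` takes the minimum and maximum across both amount columns. The colour list roughly follows Leaflet.heat's default blue-lime-red gradient; that is my assumption about how the heat map is styled, so the legend is only an approximate key to heat-map intensity. Both helper names are hypothetical.

```python
import pandas as pd


def value_range(*series: pd.Series) -> tuple:
    # Combine all amount columns and ignore missing values
    combined = pd.concat(series, ignore_index=True).dropna()
    return float(combined.min()), float(combined.max())


def make_legend(vmin: float, vmax: float, caption: str = "Amount (USD)"):
    # Imported here so value_range can be used without branca installed
    import branca.colormap as cm

    # Blue -> lime -> red, approximating Leaflet.heat's default gradient (assumption)
    return cm.LinearColormap(colors=["#0000ff", "#00ff00", "#ff0000"],
                             vmin=vmin, vmax=vmax, caption=caption)
```

Usage: `make_legend(*value_range(donation_amounts, contract_amounts)).add_to(m)`, where `m` is the `folium.Map` and the two Series are the amount columns.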

Observing the global heat map more closely, we can see that a higher than expected number of addresses fall outside the United States. This is likely because, given incomplete address data, Nominatim returned the most likely match it had for that location. It is unclear to what extent foreign businesses contribute to New York politics, but most of the addresses outside the US closely resemble places within the US, suggesting this is an artefact of incomplete address data. To improve the integrity of this analysis, we will only focus on addresses within the US.
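A minimal sketch of the US-only filter. It relies on the Geo_Address strings ending with the country name, as in the Nominatim result "..., Kansas, 66210, United States" quoted later in this notebook. An alternative would be to request `addressdetails=True` from geopy's `geocode` and filter on the returned country code.

```python
import pandas as pd


def us_only(df: pd.DataFrame, geo_col: str = "Geo_Address") -> pd.DataFrame:
    # Nominatim display names end with the country, e.g. "..., 66210, United States";
    # rows with no geocoded address are treated as non-US and dropped
    mask = df[geo_col].fillna("").str.strip().str.endswith("United States")
    return df[mask]
```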

This is much more in line with expectations, with donations mostly coming from major US population centres, which tend to contain higher net worth individuals and political insiders more likely to contribute to the political process.


Acquisition of NYC Contracts

Here we have done the same for the NYC Recent Contracts data.

Close to the centre of the map, in the Kansas City area, there is an extremely bright spot on the heat map. This bright spot is 7701, College Boulevard, Overland Park, Johnson County, Kansas, 66210, United States, the previous headquarters of the COVANSYS Corporation. This address accounts for almost half of all contract value in the Recent Contract Awards data.

Aside from this, we can see that the distribution is fairly similar to the distribution of the donations, but it is difficult to tell whether this implies a causal relationship or simply reflects the population distribution of the US.


Choropleth Log Donations

The heat map of donations can be fairly cluttered and difficult to read at times, so we will simplify it into a choropleth plot. As the amount of money donated from within New York vastly exceeds the amounts donated from other states, we will plot the base 10 logarithm of the donations and view the difference in orders of magnitude.

For Folium to interpret this data correctly, we need the USPS state codes, which map to a GeoJSON file of the US states that will be imported later. We need to ensure that every state has a value, or Folium will not be able to process the missing values.
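A sketch of the state-level preparation, assuming hypothetical `STATE` and `AMNT` columns. Reindexing over the full list of USPS codes (50 states plus DC; whether DC is in the GeoJSON file is an assumption) gives every state a value. I add 1 before taking the base 10 logarithm so that states with no donations map to 0 instead of negative infinity; this offset is my choice, not the notebook's.

```python
import numpy as np
import pandas as pd

USPS_CODES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL",
    "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT",
    "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY", "DC",
]


def log_donations_by_state(df: pd.DataFrame, state_col: str = "STATE",
                           amount_col: str = "AMNT") -> pd.Series:
    # Normalise codes such as "ny " to "NY" before grouping
    totals = df.groupby(df[state_col].str.upper().str.strip())[amount_col].sum()
    # Every state gets a value; codes outside the list (e.g. Canadian provinces) are dropped
    totals = totals.reindex(USPS_CODES, fill_value=0)
    return np.log10(totals + 1)  # +1 offset so zero-donation states map to 0
```

For `folium.Choropleth`, the result can be turned into a two-column DataFrame with `.rename_axis("state").reset_index(name="log_amount")`; the matching `key_on` path depends on the GeoJSON file used.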

We can see that the choropleth map leads to the same conclusion as the heat map: political donations mostly come from states with major population centres. There seems to be no distinction between donations from so-called "Blue states" and "Red states"; it all depends on the concentration of the population.


5. Results/Insights

Political Donations

The donation cap of $1,000 for individuals who were not candidates seems to have smoothed individual contributions. Even at the upper end of the donations by occupation, most people contributed less than $1,000.

From the Donations over Time graph, I can see that fundraising peaks every 6 months and that the amounts raised are increasing at a positive linear growth rate. As this dataset deals with donations for the 2021 election year, this is likely because more established candidates have consistent, long-running fundraising apparatuses, which account for the growth in donations leading up to the election period. During this period, members of the public outside these apparatuses are more likely to donate, and smaller independent candidates will also solicit donations.

From the US donations with an associated employment address, we can observe that most donations to NYC government officials come from within New York. There are significant contributions from Seattle in Washington State and small pockets from other cities within the US. The donations were more evenly distributed across New York than I had expected, which I believe is due to the donation cap of $5,100.


Contract Awards

Interestingly, the geographic distribution of NYC contracts is much more decentralised than I would have anticipated. The major exception is the COVANSYS contract, held by a Kansas-based company that was taken over by a larger conglomerate. What appears to be a legacy contract dwarfs all other New York government spending, at a cost of more than $10 billion US over the life of the contract. This data point is so large that it completely skewed my analysis. While there is some evidence that interstate contractors from Democrat-run states are favoured over those in Republican-run states, this is more likely due to the concentration of population in major city centres, which tend to lean Democrat anyway.


6. In Hindsight...

Challenges

Government departments are notorious for poorly maintained records, and the larger the government body, the worse condition their records tend to be in. I am pleased to report that NYC conforms to this stereotype perfectly. I severely underestimated the task of cleaning this dataset and formatting the addresses into usable geodata.


Programming Techniques and References

This project gave me an appreciation for how powerful pandas is as a data analysis tool. The ability to break the data down into small manageable objects using pivot tables and multi-indexing was invaluable for some of the more complex analyses I attempted. I have a different relationship with regular expressions.

Some people, when confronted with a problem, think "I know, I'll use regular expressions".
Now they have two problems.
More like ten thousand problems. For every address I successfully repaired using regular expressions, I'm almost certain I broke two more. This is more an indictment of the state of NYC's address data, but I struggled badly to get the addresses into a format Nominatim could process. The number of times I surveyed the Global Donation map only to find a 'Brooklyn NSW' or a '21st Street, Dubai' almost drove me to madness. What irritated me more is that I can't conclusively prove those addresses weren't the origin of those donations. If I were to attempt this again, I would use the Borough code (where it was filled in) to keep addresses that had fallen outside the New York area within New York.

Though I'm still working to understand it, I really enjoyed producing maps with Folium. I was surprised at how easy it was to produce functional heat maps once I understood the examples. I will definitely be making use of it in the future and working to improve my skills with it.


Obtaining Address Data

As New York is a well-established US city with many ties to its colonial history in Britain and Europe, many of its names and conventions are not only borrowed from this history but also shared with other cities. I suspect that every donation Nominatim marked as ‘Brooklyn, NSW’ was an incomplete filing of a New York address. I cannot conclusively say that Nominatim is wrong, but it seems more prudent to exclude these points from the geographic analysis.

Nominatim is also unable to recognise some elements of an address, particularly the suburb. I stripped the suburb out to avoid confusing it, and this increased my usable geodata significantly. I suspect that this also caused many data points to be placed incorrectly, as I had removed an element of the address that Nominatim could filter on.

Obtaining the geocodes was a monumental task. I had more than 40,000 unique addresses across both datasets, and Nominatim only permits one request per second, so a single pass takes more than 11 hours (40,000 seconds). Accounting for initial failed attempts, the process took two days in total. If I were to do this project again, I would include the BOROUGHCD column and convert it into a borough name that Nominatim could recognise. I suspect this would reduce the number of addresses falling outside New York and the US more generally, but it would not improve addresses where this code was missing.
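A sketch of that borough-aware query. The BOROUGHCD letter codes in `BOROUGHS` are hypothetical and must be confirmed against the data dictionary. Where the code is missing or unrecognised, the address is left unchanged, which matches the caveat above. Passing `country_codes="us"` to geopy's Nominatim `geocode` would be a complementary way to keep results inside the US.

```python
# Hypothetical BOROUGHCD -> borough mapping; confirm against the data dictionary
BOROUGHS = {"K": "Brooklyn", "M": "Manhattan", "Q": "Queens", "X": "Bronx", "S": "Staten Island"}


def borough_query(address: str, borough_code) -> str:
    # str() handles NaN/None, which simply fail the lookup
    borough = BOROUGHS.get(str(borough_code).strip().upper())
    if borough is None:
        return address  # no usable borough code: leave the query unchanged
    # Pin the query to the borough, the state and the country
    return f"{address.strip()}, {borough}, NY, USA"
```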


Gaining Insights into the Data

In a dataset this poorly maintained, it is difficult to draw concrete conclusions. The data makes no reference to whether each candidate was elected to their office, which would have been useful for determining how the current government's election campaigns were financed. Columns I believed would give valuable insight turned out to be useless. One example is the C_CODE column, which is designed to declare what kind of entity is donating money. In reality, every donation filing either left this entry blank or declared the donor an individual (IND). This implies that no business donated money to any New York politician at any level of government. While this is potentially true, I suspect the incomplete nature of the dataset is the culprit.


Geo Analysis and its Limitations

The geographic analysis was the focus of my exploration of the data. I wanted to determine if there was any overlap between certain regions donating money and receiving contracts in return. As discussed previously, the address data obtained through Nominatim was at best unreliable. This made drawing concrete conclusions from these heat maps difficult. Due to the difficulties I had in determining exact addresses, I decided I could only trust addresses from within the US.

Folium as a tool for geoanalysis is incredibly powerful, but I am not yet experienced enough to undertake an analysis that makes use of all its features. Simple functionality like defining an accurate legend escaped me, and I feel as though the Folium maps were more hacked together than naturally coded. I had originally intended to include other Folium plots and improve my existing ones, such as a heat map with a time series, or a highlight overlay on the choropleth map that displayed the amount donated by each state. Both of these were attempted but ultimately proved too complicated and too far beyond the scope of the original project.
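For anyone picking up the time-series heat map, here is a sketch of the data preparation for folium's `HeatMapWithTime` plugin. The plugin takes a list of frames, each a list of `[lat, lng]` or `[lat, lng, weight]` points, plus an optional `index` of frame labels. The `DATE`, `Latitude`, `Longitude` and `AMNT` column names are assumptions, and raw dollar amounts may need rescaling before they work well as weights.

```python
import pandas as pd


def heatmap_frames(df: pd.DataFrame, date_col: str = "DATE", lat_col: str = "Latitude",
                   lon_col: str = "Longitude", weight_col: str = "AMNT"):
    # Points without coordinates cannot be drawn
    clean = df.dropna(subset=[lat_col, lon_col]).copy()
    clean["month"] = pd.to_datetime(clean[date_col]).dt.to_period("M")
    frames, labels = [], []
    # One frame per month, in chronological order
    for month, group in clean.groupby("month"):
        frames.append(group[[lat_col, lon_col, weight_col]].astype(float).values.tolist())
        labels.append(str(month))
    return frames, labels
```

Usage: `frames, labels = heatmap_frames(us_donations)` followed by `folium.plugins.HeatMapWithTime(frames, index=labels).add_to(m)`.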